Conversation

gmalasan
Contributor

As discussed before, this PR adds the basic infrastructure/boilerplate needed for loop ordering strategies to be implemented.

If this looks OK, I also wanted to mention some of the heuristics I would implement next, if they sound reasonable to you:

  • Parallel first: prioritize parallel loops over reduction loops
  • Dense outer: prioritize the densest loops first
  • Sparse outer: the opposite; potentially useful in some cases?

There is another strategy I am considering, stride/memory-aware ordering, which would prioritize loops with better stride patterns (such as sequential or linear access). I'm not sure how well this carries over to SparseTensor, though. Are there any ideas/heuristics that I should definitely try to implement?
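
For concreteness, a minimal sketch of how these candidates could slot into the enum introduced by this PR; only kDefault exists in this change, and the other enumerator names are hypothetical:

namespace mlir {
namespace sparse_tensor {

// Hypothetical extension of LoopOrderingStrategy (illustration only; not
// part of this PR).
enum class LoopOrderingStrategy : unsigned {
  kDefault,       ///< Default strategy (current behavior).
  kParallelFirst, ///< Prefer parallel loops over reduction loops.
  kDenseOuter,    ///< Prefer the densest loops in outer positions.
  kSparseOuter,   ///< Prefer the sparsest loops in outer positions.
};

} // namespace sparse_tensor
} // namespace mlir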

As we discussed, I will try to incrementally add heuristics. Sorry for the delay on my end, and thank you so much for the feedback!


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e., adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the mlir:sparse (Sparse compiler in MLIR) and mlir labels on Aug 21, 2025
@llvmbot
Member

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-mlir-sparse

Author: Govind Malasani (gmalasan)


Full diff: https://github.com/llvm/llvm-project/pull/154656.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h (+13-1)
  • (modified) mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td (+5)
  • (modified) mlir/lib/Dialect/SparseTensor/Transforms/SparseReinterpretMap.cpp (+4-1)
  • (modified) mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp (+11-1)
diff --git a/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h b/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h
index 212f7b6f13c26..24f30353e58b5 100644
--- a/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h
+++ b/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h
@@ -55,6 +55,15 @@ enum class SparseEmitStrategy {
   kDebugInterface, // generate only place-holder for sparse iteration
 };
 
+namespace sparse_tensor {
+
+/// Selects between different loop ordering strategies for sparse tensor 
+enum class LoopOrderingStrategy : unsigned {
+  kDefault, ///< Default strategy (current behavior)
+};
+
+} // namespace sparse_tensor
+
 #define GEN_PASS_DECL
 #include "mlir/Dialect/SparseTensor/Transforms/Passes.h.inc"
 
@@ -72,10 +81,13 @@ std::unique_ptr<Pass> createSparseAssembler(bool directOut);
 //===----------------------------------------------------------------------===//
 
 void populateSparseReinterpretMap(RewritePatternSet &patterns,
-                                  ReinterpretMapScope scope);
+                                  ReinterpretMapScope scope,
+                                  sparse_tensor::LoopOrderingStrategy strategy = sparse_tensor::LoopOrderingStrategy::kDefault);
 
 std::unique_ptr<Pass> createSparseReinterpretMapPass();
 std::unique_ptr<Pass> createSparseReinterpretMapPass(ReinterpretMapScope scope);
+std::unique_ptr<Pass> createSparseReinterpretMapPass(ReinterpretMapScope scope, 
+                                                     sparse_tensor::LoopOrderingStrategy strategy);
 
 //===----------------------------------------------------------------------===//
 // The PreSparsificationRewriting pass.
diff --git a/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td b/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
index 2513e106f5b06..1b0c7f7583827 100644
--- a/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
@@ -81,6 +81,11 @@ def SparseReinterpretMap : Pass<"sparse-reinterpret-map", "ModuleOp"> {
          clEnumValN(mlir::ReinterpretMapScope::kExceptGeneric,
                     "except-generic",
                     "Run on operations expect linalg.generic (e.g., foreach)"))}]>,
+    Option<"loopOrderingStrategy", "loop-ordering-strategy", "mlir::sparse_tensor::LoopOrderingStrategy",
+       "mlir::sparse_tensor::LoopOrderingStrategy::kDefault",
+       "Set the loop ordering strategy for sparse tensor dialect", [{llvm::cl::values(
+         clEnumValN(mlir::sparse_tensor::LoopOrderingStrategy::kDefault, "default",
+                    "Default strategy (current behavior)"))}]>,
   ];
 }
 
diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseReinterpretMap.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseReinterpretMap.cpp
index df9b6cf040efa..06c58d2b13cf2 100644
--- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseReinterpretMap.cpp
+++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseReinterpretMap.cpp
@@ -788,7 +788,10 @@ struct ForeachOpDemapper
 } // namespace
 
 void mlir::populateSparseReinterpretMap(RewritePatternSet &patterns,
-                                        ReinterpretMapScope scope) {
+                                        ReinterpretMapScope scope,
+                                        sparse_tensor::LoopOrderingStrategy strategy) {
+  (void)strategy; // Suppress unused parameter warning
+  
   if (scope == ReinterpretMapScope::kAll ||
       scope == ReinterpretMapScope::kGenericOnly) {
     patterns.add<GenericOpReinterpretMap, GenericOpScheduler>(
diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
index 153b9b170e5d3..aa31927e0602c 100644
--- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
+++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
@@ -67,12 +67,13 @@ struct SparseReinterpretMap
   SparseReinterpretMap(const SparseReinterpretMap &pass) = default;
   SparseReinterpretMap(const SparseReinterpretMapOptions &options) {
     scope = options.scope;
+    loopOrderingStrategy = options.loopOrderingStrategy;
   }
 
   void runOnOperation() override {
     auto *ctx = &getContext();
     RewritePatternSet patterns(ctx);
-    populateSparseReinterpretMap(patterns, scope);
+    populateSparseReinterpretMap(patterns, scope, loopOrderingStrategy);
     (void)applyPatternsGreedily(getOperation(), std::move(patterns));
   }
 };
@@ -438,6 +439,15 @@ mlir::createSparseReinterpretMapPass(ReinterpretMapScope scope) {
   return std::make_unique<SparseReinterpretMap>(options);
 }
 
+std::unique_ptr<Pass>
+mlir::createSparseReinterpretMapPass(ReinterpretMapScope scope, 
+                                     sparse_tensor::LoopOrderingStrategy strategy) {
+  SparseReinterpretMapOptions options;
+  options.scope = scope;
+  options.loopOrderingStrategy = strategy;
+  return std::make_unique<SparseReinterpretMap>(options);
+}
+
 std::unique_ptr<Pass> mlir::createPreSparsificationRewritePass() {
   return std::make_unique<PreSparsificationRewritePass>();
 }
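
For reference, a minimal sketch of how the new overload and pass option could be exercised, using only the declarations in the diff above; the pipeline helper name is illustrative, and the mlir-opt spelling below is an assumption based on the option name declared in Passes.td:

#include "mlir/Dialect/SparseTensor/Transforms/Passes.h"
#include "mlir/Pass/PassManager.h"

// Illustrative pipeline helper (name not from the PR). The two-argument
// factory added by this change forwards the strategy into the pass options;
// populateSparseReinterpretMap itself defaults to kDefault when the strategy
// argument is omitted.
static void buildSparsePipeline(mlir::PassManager &pm) {
  pm.addPass(mlir::createSparseReinterpretMapPass(
      mlir::ReinterpretMapScope::kAll,
      mlir::sparse_tensor::LoopOrderingStrategy::kDefault));
}

From the command line, the equivalent would presumably be something like mlir-opt --sparse-reinterpret-map="loop-ordering-strategy=default" input.mlir.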

@llvmbot
Member

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-mlir


…egy fully available in the topoSort method in IterationGraphSorter
@gmalasan
Contributor Author

Hey @aartbik, thank you so much for the feedback! I believe I've addressed the issues you pointed out; I'd love to know if there's anything else I should fix.

Otherwise, I'd like to ask whether my proposed next steps make sense: implementing parallel-first, dense-outer, and sparse-outer heuristics.

I would love to hear your thoughts on these strategies, or honestly on any other heuristics you think are worth implementing. Thanks again!

Contributor

@aartbik aartbik left a comment

Two last nits.

As for next steps: once this PR goes in, please start incrementally adding new strategies, but make sure to provide evidence of the usefulness of each, and avoid hard-coding values that are specific to the experiments you conducted (or at least document them properly).

@aartbik
Contributor

aartbik commented Sep 8, 2025

@gmalasan did you see my comments? Of course, take your time, but given the silence after the review I was wondering whether you got the notification.

@gmalasan
Contributor Author

gmalasan commented Sep 8, 2025

Sorry for the delay, and thanks for the reminder; I just made the fixes!

I also wanted to ask what evidence I should provide for the usefulness of each heuristic. What kinds of experiments are good to do? Benchmarking with various loop orderings? And is there any infrastructure that already exists for benchmarking this kind of thing? Or somewhere I can check for more help?

Thanks again for all the help!

@aartbik
Contributor

aartbik commented Sep 9, 2025

I also wanted to ask what evidence I should provide for the usefulness of each heuristic. What kinds of experiments are good to do? Benchmarking with various loop orderings? And is there any infrastructure that already exists for benchmarking this kind of thing? Or somewhere I can check for more help?

There is no single good answer for this. MLIR is a compiler infrastructure, not a specific compiler for a specific target architecture, so in that sense we should not fine-tune too much for particular CPUs or GPUs. Having said that, in your previous PR you had a lot of hard-coded constants that, I assume, were the result of tuning for a particular target. Let's avoid that and instead document the reasoning behind each heuristic, like "if there are more reads than writes it is beneficial to ... because ..." but not "if (s > 0.2313) heuristic1". It is okay to benchmark specific examples, and to validate and even document the outcome, but eventually the code should reflect reasoning that others can follow. Does that make sense?
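
To make the contrast concrete, here is a purely illustrative sketch, with hypothetical names not taken from this PR or the MLIR tree, of a heuristic whose justification lives in the code rather than in a tuned constant:

#include <cstdint>

// Hypothetical per-loop summary; a real implementation would derive this
// from the iteration graph rather than build it by hand.
struct LoopTraits {
  bool isParallel;        // the loop carries no reduction
  int64_t denseAccesses;  // operands traversed densely along this loop
  int64_t sparseAccesses; // operands traversed through compressed levels
};

// Documented reasoning instead of a magic number: parallel loops are ranked
// outer because they impose no cross-iteration dependence, and dense
// traversal is ranked outer so that the innermost loops iterate over the
// compressed (sparse) levels. There is no tuned threshold along the lines
// of "if (s > 0.2313)".
static int64_t outerPriority(const LoopTraits &l) {
  int64_t priority = l.denseAccesses - l.sparseAccesses;
  if (l.isParallel)
    priority += l.denseAccesses + l.sparseAccesses;
  return priority;
}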

@gmalasan
Contributor Author

gmalasan commented Sep 22, 2025

Thanks for the advice, and sorry for the delay in implementing heuristics on my end. I tried implementing a generic heuristic that prefers dense loops outer and sparse/singleton loops inner. Does this heuristic make sense? And, as always, is there anything I should fix or look out for? I completely avoided using any magic constants.

@aartbik
Contributor

aartbik commented Sep 22, 2025

Wait, are you continuing new development in this PR? The idea was that we reviewed and approved the flag change, so that it can go into main. After that, you send out new PRs (with much smaller deltas) for the new heuristics, which we then discuss in turn.

If you just keep building on top of this PR, we again end up with very large diffs, and I have to constantly filter out what is new and what is old.

@aartbik aartbik self-requested a review September 22, 2025 16:18
@gmalasan gmalasan force-pushed the sparse-tensor-loop-ordering-infrastructure branch from bd92633 to 79ee468 on September 22, 2025 18:23
@gmalasan
Contributor Author

Sorry about that; I reset this branch to the last state that you approved, and I'm opening a new PR for this heuristic. Thank you for letting me know about the process; it definitely makes sense not to do so much in one PR.

Contributor

@aartbik aartbik left a comment

Great, yes, please submit this and then make a new PR for the first new heuristic, which we will discuss there!

@gmalasan
Contributor Author

Hi @aartbik, sorry, I'm a little confused about what you meant by submitting this. I see that there are two workflows awaiting approval from a maintainer, but I'm not sure whether there's anything else I can do. Or is there a way for me to merge the changes into main myself (I'm assuming not)?


github-actions bot commented Sep 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@aartbik
Contributor

aartbik commented Sep 25, 2025

Please run the formatter to fix the style issues

@gmalasan
Contributor Author

I believe the formatting issues are now fixed.

@aartbik aartbik enabled auto-merge (squash) September 30, 2025 17:00
@aartbik aartbik merged commit 95e0ae9 into llvm:main Oct 6, 2025
9 checks passed

github-actions bot commented Oct 6, 2025

@gmalasan Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@gmalasan gmalasan deleted the sparse-tensor-loop-ordering-infrastructure branch October 12, 2025 23:42